ABSTRACT

Partitions  of  regular  interconnection  networks  used  for  multiprocessing  are 
investigated.  Bounds  on  the  number  of  components  as  a  function  of  the  number  of 
connections  between  components  are  given.  The  relation  between  the  existence  of 
partitions  for  networks  and  their  ability  to  support  efficiently  data  motions  is 
examined.

1.  INTRODUCTION

The  fast  advance  in  microelectronics  has  fostered  interest  in  new 
architectures  suited  to  this  technology.  In  particular,  much  research  has  been 
devoted  to  the  design  of  systems  consisting  of  large,  regular  networks  of 
identical computation nodes. The advance in VLSI has also motivated a large
number of papers on the theoretical constraints of VLSI technology. In
particular, the area required to realize different types of regular communication
graphs has been investigated.

The  silicon  area  required  to  realize  a  large  computational  system  frequently 
exceeds  the  area  of  one  chip.  The  network  has  to  be  packaged  into  several 
components.  The  cost  of  multichip  systems  is  best  reflected  by  the  number  of 
components  rather  than  by  the  sum  of  their  area.  The  "component  number" 
complexity  of  a  network  is  therefore  no  less  important  than  its  area  complexity. 

The  number  of  components  needed  to  realize  a  given  network  is  affected  by 
two  constraints.  The  "area"  of  a  component,  i.e.  number  of  nodes  that  can  be 
packed  on  a  component,  is  restricted  by  the  physical  size  of  the  chip.  The 
"perimeter"  of  a  component,  i.e.  number  of  lines  leaving  that  component,  is 
restricted by the pin count of the chip. This last constraint becomes
increasingly important as VLSI technology advances. The pin count of
VLSI chips increases more slowly than the gate count, since the size of external
wires does not scale down at the same rate as the internal feature size. Also,
there is an increasing disparity between the speed of internal lines and the
speed of external lines. This limits the use of time-multiplexing to overcome
the pin count constraint. It is therefore important to study the effect of
"perimeter" constraints on the partitionability of interconnection graphs.

Such a study is undertaken in this paper. We characterize the "area" to
"perimeter" relationship for families of common graphs such as grids, trees, and
shuffle-exchange graphs. We then relate the partitionability of
graphs to their ability to support data motions efficiently, and show that
networks which are efficient for routing do not partition well.


2.  DEFINITIONS 

We  represent  a  network  by  a  triple  N  =  <V,P,E>,  where  <VUP,E>  is  an 
undirected  multigraph  (multiple  edges  between  nodes  are  allowed).  V  is  the  set 
of  (internal)  nodes  of  N,  E  is  the  set  of  edges  of  N,  and  P  is  the  set  of  ports 
of  N.  Ports  will  be  used  to  represent  external  connections  to  the  network.  We 
say that the network is closed if P = ∅, and open otherwise.

Let U be a subset of V. We define the boundary of U to be the set of edges
connecting nodes from U to nodes outside U. The size s(U) of U is the number of
nodes in U, and the perimeter p(U) of U is the number of edges in the boundary of
U.

Π is a partition of the network N if it is a partition of the internal nodes
of N. A partition of N is an (s,p)-partition if each component has size at most s
and perimeter at most p. We can associate with each partition Π of the network N
a network Π(N) defined as follows: the ports of Π(N) are the ports of N; the
nodes of Π(N) are the components of the partition; and the edges of Π(N) are the
edges of N connecting nodes in different components. Thus, N admits an
(s,p)-partition with k components iff there exists a graph homomorphism f: N → N'
such that

1.  f defines a one to one correspondence between the ports of N and the
ports of N'.

2.  N' has k internal nodes.

3.  For each internal node v of N', $|f^{-1}(v)| \le s$ and $\mathrm{degree}(v) \le p$.


3.  MESH-CONNECTED  NETWORKS 

A d-dimensional mesh-connected network of size $M = m^d$ consists of the nodes
$\langle a_1 \ldots a_d \rangle$, $1 \le a_i \le m$. The node $\langle a_1 \ldots a_d \rangle$ is connected to the nodes
$\langle a_1 \ldots a_{i-1}, a_i \pm 1, a_{i+1} \ldots a_d \rangle$, where such a node exists. In an open mesh-connected
network the nodes at the boundary of the mesh (i.e. nodes $\langle a_1 \ldots a_d \rangle$ such that
$a_i = 1$ or $a_i = m$ for some i) are assumed to be ports. Of particular interest are
2-dimensional  grids  which  have  simple  planar  layouts.  With  such  layout  the 
perimeter  of  a  set  of  nodes  is  related  to  the  perimeter  of  a  surface,  whereas  the 
number  of  nodes  in  the  set  is  related  to  its  area.  We  can  therefore  bound  the 
maximal  size  of  a  set  as  a  function  of  its  perimeter. 

We  recall  the  following  facts  from  planar  geometry. 

FACT 1. The largest area of a surface circumscribed by a curve of fixed
length is achieved by a circle. Thus, the area A of a surface is related to its
perimeter P by the inequality $A \le P^2/4\pi$. The same inequality is still valid when
the surface is the union of several connected components.

FACT 2. The largest area A of a surface circumscribed by a straight segment
and a curve of length P is achieved by a half-circle, and is $P^2/2\pi$.

FACT 3. The largest area A of a surface circumscribed by two orthogonal
segments and a curve of length P is achieved by a quadrant, and is $P^2/\pi$.

THEOREM 3.1 Let U be a subset of internal nodes in a 2-dimensional open
mesh-connected network. Then $s(U) = O(p(U)^2)$.

PROOF:  We  represent  the  network  by  a  square  grid,  with  nodes  at  unit  distances. 
We  associate  to  each  node  P  in  U  a  unit  square  centered  at  P.  The  union  of  these 
squares  forms  a  surface  S  with  area  s(U)  and  perimeter  p(U)  (Fig.  1).  The  claim 
now  follows  from  fact  1. 
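The bound of Theorem 3.1 can be checked exhaustively on a small grid. The sketch below is only an illustration: it assumes the nodes lie in the interior of a larger open mesh, so that every node has four neighbours, and the function names `perimeter` and `worst_ratio` are hypothetical helpers, not part of the original text.

```python
from itertools import combinations
from math import pi

def perimeter(cells):
    """Number of grid edges joining a node in `cells` to a node outside it.
    Assumption: every node has four neighbours (interior of a larger mesh)."""
    cells = set(cells)
    count = 0
    for (x, y) in cells:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) not in cells:
                count += 1
    return count

def worst_ratio(side, max_size):
    """Largest s / p^2 over all nonempty subsets of at most `max_size`
    nodes taken from a side x side block of internal nodes."""
    nodes = [(x, y) for x in range(side) for y in range(side)]
    best = 0.0
    for s in range(1, max_size + 1):
        for subset in combinations(nodes, s):
            p = perimeter(subset)
            best = max(best, s / (p * p))
    return best

# Fact 1 predicts that the ratio never exceeds 1 / (4 * pi).
print(worst_ratio(3, 9), 1 / (4 * pi))
```

The enumeration is exponential in the block size, so it is only meant for blocks of a few nodes.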


COROLLARY 3.2 A minimal (∞,p)-partition of a 2-dimensional open mesh-connected
network of size M has $\Theta(M/p^2)$ components.

PROOF: The lower bound follows from the previous theorem. An (s,p)-partition
of a 2-dimensional mesh-connected network into components of size $\Theta(p^2)$ is simply
achieved by partitioning into subsquares with side proportional to p (a subsquare
of side p/4 has perimeter p).


The same argument can be carried through for d-dimensional mesh-connected
networks, d > 2, by considering regular grids in d-dimensional space. We obtain:

THEOREM 3.3 (i) The maximal size of a component with perimeter p in an open
d-dimensional mesh-connected network is $\Theta(p^{d/(d-1)})$.

(ii) A minimal (∞,p)-partition of a d-dimensional open mesh-connected network of
size M has $\Theta(M/p^{d/(d-1)})$ components.

Note that the constant implicit in the Θ notation depends on d.

The previous asymptotic results are valid for closed mesh-connected networks
as well. Let N be a closed 2-dimensional mesh-connected network of size M and
let U be a connected subset of nodes of size $s \le \alpha M$, for some constant α < 1, and
perimeter p. If each connected component of the complement of U is supported by
at most two sides of N then we have (using fact 3) $M - s \le p^2/\pi$, so that

$$ (1-\alpha) M \le \frac{p^2}{\pi}, \quad \text{and} \quad s \le \alpha M \le \frac{\alpha}{\pi(1-\alpha)}\, p^2 . $$

If the boundary of U reaches across two opposite sides of N then we have
$p \ge M^{1/2}$, and $s \le p^2$. Otherwise U is supported by at most two sides of N, so
that

$$ s \le p^2/\pi . $$

These  inequalities  imply  the  following  theorem. 


THEOREM 3.4 (i) Let α < 1 be a fixed constant. The maximal size of a component
of perimeter p and size ≤ αM in a 2-dimensional mesh-connected closed network of
size M is $\Theta(p^2)$.

(ii) Let α < 1 be a fixed constant. Then a minimal (αM,p)-partition of a closed
2-dimensional mesh-connected network of size M has $\Theta(M/p^2)$ components.

This result generalizes to higher dimensions.

The  same  lower  bound  argument  is  valid  for  hexagonally  connected  networks, 
or  in  general  for  any  network  that  can  be  embedded  in  d-dimensional  space  such 
that  the  distance  between  two  nodes  is  at  least  1,  and  the  length  of  each  edge  is 
at  most  c,  for  some  constant  c  which  is  independent  of  the  network  size. 


4.  REGULAR TREES

A complete k-tree of depth d consists of the nodes $\langle a_1 \ldots a_j \rangle$, where
$0 \le j \le d$ and $1 \le a_i \le k$. The node $\langle a_1 \ldots a_j \rangle$ is connected to the node
$\langle a_1 \ldots a_{j-1} \rangle$ (its parent) and to the nodes $\langle a_1 \ldots a_j a_{j+1} \rangle$ (its children), if such
exist. The tree contains $(k^{d+1}-1)/(k-1)$ nodes and $k^d$ leaves. In an open k-tree
the leaves are taken to be ports, and an additional port connected to the root is
added.

LEMMA 4.1 A closed complete k-tree with M nodes has an (s,p)-partition with
O(M/s) components, for any p > 2.

PROOF: Let d be the depth of the tree. Then $M = (k^{d+1}-1)/(k-1)$ and the tree
has $k^d$ leaves. We assume w.l.g. that $s = (k^{r+1}-1)/(k-1)$, so that a complete k-tree
of size s has depth r and $k^r$ leaves. The r bottom levels of the tree can be
partitioned into $k^{d-r} = \frac{(k-1)M+1}{(k-1)s+1}$ subtrees of weight s. The remaining
$\frac{M-s}{(k-1)s+1}$ nodes can be partitioned into O(M/kps) subtrees, each with p-1 outgoing
edges at the leaves and one outgoing edge at the root.


The  last  result  is  asymptotically  optimal.  It  implies  that  perimeter 
restrictions  do  not  affect  significantly  the  number  of  components  in  a  partition 
of  a  closed  tree.  The  components  containing  the  nodes  at  the  upper  levels  of  the 
tree  contain  very  few  nodes,  whereas  nodes  at  the  lowest  levels  can  be  densely 
packed  into  subtrees  with  only  one  outgoing  connection.  As  the  number  of  nodes 
increases  exponentially  per  level,  it  is  sufficient  to  partition  efficiently  the 
lowest  levels. 

Note that, as pointed out in [1], it is possible to partition a binary tree into
identical components of size $2^r$ and perimeter 4. Each component contains a
subtree of size $2^r - 1$ of nodes from the bottom r-1 levels of the tree, and one
node from the higher levels; it has one outgoing edge from the root of the
subtree and three outgoing edges from the remaining node. This idea can be
generalized to yield optimal $(k^r, 2k)$-partitions of closed k-trees into identical
components.

The  situation  is  completely  different  when  we  turn  to  open  trees.   We  have: 

LEMMA 4.2 Let U be a set of internal nodes in a k-tree. Then
$p(U) \ge (k-1)s(U) + 2$.

PROOF: The number e of leaves in a subtree of a k-tree is related to the number
i of internal nodes by the relation e = i(k-1)+1. Each connected component C of
U is a subtree. If C has c nodes then it has c(k-1)+2 outgoing edges. The claim
now follows.
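Lemma 4.2 can be verified exhaustively on very small trees. The following sketch assumes the open k-tree defined above, in which every internal node has degree k+1 (k edges to children or leaf ports, and one edge to its parent or to the root port); the helper names are hypothetical.

```python
from itertools import combinations, product

def internal_nodes(k, depth):
    """Internal nodes <a_1...a_j>, 0 <= j <= depth, encoded as tuples.
    The ports of the open tree (one level below, plus the root port)
    are not represented explicitly."""
    return [t for j in range(depth + 1)
            for t in product(range(1, k + 1), repeat=j)]

def perimeter(U, k):
    """Edges leaving U, assuming every internal node has degree k + 1."""
    U = set(U)
    # Tree edges with both endpoints in U: a non-root node whose parent is in U.
    inside = sum(1 for t in U if t and t[:-1] in U)
    return (k + 1) * len(U) - 2 * inside

def lemma_holds(k, depth):
    """Check p(U) >= (k-1) s(U) + 2 for every nonempty set U of internal nodes."""
    nodes = internal_nodes(k, depth)
    return all(
        perimeter(U, k) >= (k - 1) * len(U) + 2
        for s in range(1, len(nodes) + 1)
        for U in combinations(nodes, s)
    )
```

For a connected set the inequality holds with equality; each additional connected component adds 2 to the perimeter.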


COROLLARY 4.3 (i) The maximal size of a component of an open k-tree with
perimeter p is $\Theta(p/(k-1))$.

(ii) A minimal (∞,p)-partition of an open k-tree of size M has $\Theta(M(k-1)/p)$
components.

PROOF: The lower bound follows from the previous lemma. A straightforward
partition into subtrees achieves the upper bound.


5.  Ω-NETWORKS

Ω-networks have been introduced by Lawrie [6] as interconnection networks
for parallel processing. Many other networks considered in the literature are
topologically equivalent to Ω-networks. This includes the Flip (Staran) network,
the indirect binary cube network, the baseline network, and the SW-banyan network
with 2x2 switches [11,4]. The graph describing the Discrete Fourier Transform
algorithm also has the same topology. Our results are valid for each of these
networks. They can also be applied to Benes networks, which essentially consist
of two Ω-networks joined end to end [5].

We represent for convenience Ω-networks as directed graphs. The network $\Omega_m$
has $M = 2^m$ input ports $I_{a_1 \ldots a_m}$, $2^m$ output ports $O_{a_1 \ldots a_m}$, and m columns of $2^{m-1}$
2-input, 2-output internal nodes $S_{a_1 \ldots a_{m-1};j}$, where $a_i = 0,1$ and $1 \le j \le m$. The edges
of $\Omega_m$ connect the input port $I_{a_1 \ldots a_m}$ to the node $S_{a_1 \ldots a_{m-1};1}$; the node
$S_{a_1 \ldots a_{m-1};j}$ to the nodes $S_{\beta, a_1 \ldots a_{m-2};j+1}$, β = 0,1; and the node $S_{a_1 \ldots a_{m-1};m}$ to
the output ports $O_{\beta, a_1 \ldots a_{m-1}}$, β = 0,1. The network $\Omega_3$ is illustrated in Fig. 2.


Each internal node of an Ω-network has the same number of incoming edges as
of outgoing edges. A "flow conservation" argument shows, therefore, that if U is
a set of internal nodes in an Ω-network, half of the edges in the boundary of U
are incoming edges and half of the edges in the boundary are outgoing edges.

THEOREM 5.1 Let U be a set of nodes in $\Omega_m$ of size s with p incoming edges.
Then $s \le \frac{1}{2} p \lg p$.

PROOF: The claim is true for $\Omega_1$, which contains one node. Assume it is true
for $\Omega_{m-1}$. Note that the graph of $\Omega_m$ consists of one stage containing $2^{m-1}$
switches followed by two isomorphic copies $\Omega^0$ and $\Omega^1$ of $\Omega_{m-1}$; input ports are
connected to the switches in the first stage, and each node in that stage is
connected to $\Omega^0$ and to $\Omega^1$ (i.e. identified with one input port from $\Omega^0$ and with
one input port from $\Omega^1$); see [11] or [5] for a proof and Fig. 3 for an
illustration.

Let $U_e$ be the subset of nodes of U which are internal to $\Omega^e$, e=0,1, let $U_2$ be the
subset of nodes of U which are in the first stage of $\Omega_m$, and let $s_j = |U_j|$, for
j=0,1,2. Let $p_j$ be the number of incoming edges to U which are incident to a
node in $U_j$, and let $q_e$ be the number of edges connecting nodes in $U_2$ to nodes in
$U_e$. The number of incoming edges to $U_e$ is $p_e + q_e$, and we have by the inductive
assumption

$$ s_e \le \tfrac{1}{2}(p_e+q_e)\lg(p_e+q_e), \quad e=0,1. $$

Since each node in $U_2$ is connected to two input ports and to one node in each of
the subnetworks $\Omega^0$ and $\Omega^1$ we have $p_2 = 2s_2$, and $q_e \le s_2$, e=0,1, so that

$$ q_e \le \tfrac{1}{2}p_2, \quad e=0,1. $$

Thus

$$ s = s_0 + s_1 + s_2 $$
$$ \le \tfrac{1}{2}(p_0+q_0)\lg(p_0+q_0) + \tfrac{1}{2}(p_1+q_1)\lg(p_1+q_1) + \tfrac{1}{2}p_2 $$
$$ \le \tfrac{1}{2}(p_0+\tfrac{1}{2}p_2)\lg(p_0+\tfrac{1}{2}p_2) + \tfrac{1}{2}(p_1+\tfrac{1}{2}p_2)\lg(p_1+\tfrac{1}{2}p_2) + \tfrac{1}{2}p_2 . $$

The right hand side of this inequality is maximized when $p_0 = p_1 = (p-p_2)/2$.
Thus

$$ s \le 2\left[\tfrac{1}{2}\left(\tfrac{p-p_2}{2} + \tfrac{1}{2}p_2\right)\lg\left(\tfrac{p-p_2}{2} + \tfrac{1}{2}p_2\right)\right] + \tfrac{1}{2}p_2 $$
$$ = \tfrac{1}{2}p\lg\tfrac{p}{2} + \tfrac{1}{2}p_2 $$
$$ = \tfrac{1}{2}p\lg p + \tfrac{1}{2}(p_2 - p) $$
$$ \le \tfrac{1}{2}p\lg p . $$

A similar result is proven in [4].
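For very small networks the inequality of Theorem 5.1 can be checked by brute force. The sketch below is an illustration under stated assumptions: the internal node $S_{a_1 \ldots a_{m-1};j}$ is encoded as a pair `(x, j)` with `x` the integer whose binary digits are $a_1 \ldots a_{m-1}$ ($a_1$ most significant), and a node in column j is assumed to feed the two nodes of column j+1 whose addresses are obtained by dropping the last bit and prepending a new one. The function names are hypothetical.

```python
from itertools import combinations
from math import log2

def omega_nodes(m):
    """Internal nodes (address, column): m columns of 2^(m-1) switches."""
    return [(x, j) for j in range(1, m + 1) for x in range(2 ** (m - 1))]

def predecessors(node, m):
    """The two switches feeding `node`; an empty list means that the node
    is in the first column and is fed by two input ports."""
    y, j = node
    if j == 1:
        return []
    mask = 2 ** (m - 1) - 1
    return [(((y << 1) & mask) | c, j - 1) for c in (0, 1)]

def incoming_edges(U, m):
    """Number of edges entering the node set U from outside U."""
    U = set(U)
    total = 0
    for node in U:
        preds = predecessors(node, m)
        if not preds:
            total += 2  # both incoming edges come from input ports
        else:
            total += sum(1 for q in preds if q not in U)
    return total

def theorem_holds(m):
    """Check s <= (1/2) p lg p for every nonempty set of internal nodes."""
    nodes = omega_nodes(m)
    for s in range(1, len(nodes) + 1):
        for U in combinations(nodes, s):
            p = incoming_edges(U, m)
            if s > 0.5 * p * log2(p) + 1e-9:  # small tolerance for float error
                return False
    return True
```

The search enumerates all subsets, so it is practical only for m up to 3 (12 internal nodes). The bound is attained, for instance, by the whole network: s = 12 and p = 8 when m = 3.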

COROLLARY 5.2 (i) The maximal size of a component of an Ω-network with
perimeter p is $\Theta(p \lg p)$.

(ii) A minimal (∞,p)-partition of an Ω-network with M nodes has $\Theta(M/(p \lg p))$
components.

PROOF: The lower bound follows from the last theorem and the remark that the
number of incoming edges to a component equals the number of outgoing edges
from that component. A corresponding upper bound can be obtained using a
partition of the Ω-network suggested by Lawrie [6]. If U is a set of $2^k$ internal
nodes in the same stage of the Ω-network, with addresses differing only in the
lowest k bits, then in each of the subsequent k stages of the network the nodes in
U have exactly $2^k$ descendants. Thus, if k divides m, we can partition the nodes
of $\Omega_m$ into $(m/k)2^{m-k}$ subsets containing $k2^{k-1}$ nodes, each set with $2^k$ incoming
lines and $2^k$ outgoing lines. Such a set will contain $2^{k-1}$ consecutive nodes in one
column, and their successors in the next k-1 columns (see Fig. 4).
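The block partition used in this proof can be built explicitly and its parameters checked. The sketch below is a hedged illustration, not Lawrie's own construction verbatim: it assumes that a switch with address x (an (m-1)-bit integer, first address bit most significant) feeds the two switches of the next column whose addresses are x shifted right by one bit with a new leading bit. The names `lawrie_blocks` and `incoming_edges` are hypothetical.

```python
def lawrie_blocks(m, k):
    """Partition the switches (address, column) of an m-column network into
    blocks made of 2^(k-1) consecutive switches of one column together with
    their successors in the next k-1 columns. Requires m >= 2 and k | m."""
    assert m >= 2 and m % k == 0
    blocks = []
    for start in range(1, m + 1, k):
        for high in range(2 ** (m - k)):
            # Switches sharing the high address bits, differing in the low k-1 bits.
            column = {(high << (k - 1)) | low for low in range(2 ** (k - 1))}
            block = set()
            for j in range(start, start + k):
                block |= {(x, j) for x in column}
                # Successors: drop the last bit, prepend a new leading bit.
                column = {(beta << (m - 2)) | (x >> 1)
                          for x in column for beta in (0, 1)}
            blocks.append(block)
    return blocks

def incoming_edges(block, m):
    """Number of edges entering `block` from input ports or other switches."""
    mask = 2 ** (m - 1) - 1
    total = 0
    for (y, j) in block:
        if j == 1:
            total += 2  # fed by two input ports
        else:
            total += sum(1 for c in (0, 1)
                         if ((((y << 1) & mask) | c), j - 1) not in block)
    return total
```

With m = 4 and k = 2, for example, this yields 8 blocks of 4 switches, each with 4 incoming lines, in agreement with the counts $(m/k)2^{m-k}$, $k2^{k-1}$ and $2^k$ stated in the proof.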

Note that the connections between components will follow the same pattern as
in an Ω-network of degree $2^k$: An Ω-network of degree d has $d^m$ input ports, $d^m$
output ports and m columns, each containing $d^{m-1}$ internal nodes with indegree and
outdegree d. The nodes are labelled, and the connections (edges) are defined as
for a usual (2x2) Ω-network, with addresses being given to base d rather than in
binary notation.

In the general case (k does not divide m) it is still possible to partition
$\Omega_m$ into no more than $\lceil m/k \rceil 2^{m-k}$ subsets, each with perimeter at most $2^{k+1}$.


6.  SHUFFLE-EXCHANGE  NETWORKS 

The shuffle-exchange network $SE_m$ consists of $2^m$ nodes $\langle a_1 \ldots a_m \rangle$ and $3 \cdot 2^{m-1}$
edges. Node $\langle a_1 \ldots a_m \rangle$ is connected to node $\langle a_2 \ldots a_m, a_1 \rangle$ (shuffle connection),
node $\langle a_m, a_1 \ldots a_{m-1} \rangle$ (unshuffle connection), and node $\langle a_1 \ldots \bar{a}_m \rangle$ (exchange
connection). We have drawn $SE_3$ in Fig. 5.
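The edge count $3 \cdot 2^{m-1}$ can be reproduced by listing the connections. This is a minimal sketch with a hypothetical helper name; it assumes nodes are encoded as m-bit integers with $a_1$ as the most significant bit, and it lists each shuffle connection once, from the node to its shuffle image (the unshuffle connection is the same edge seen from the other end).

```python
def shuffle_exchange_edges(m):
    """Return (shuffle, exchange) edge lists of SE_m as pairs of integers."""
    n = 2 ** m
    # Shuffle: <a_1 ... a_m> -> <a_2 ... a_m, a_1>, a cyclic left rotation.
    shuffle = [(x, ((x << 1) | (x >> (m - 1))) & (n - 1)) for x in range(n)]
    # Exchange: pairs of nodes differing in the last address bit only.
    exchange = [(x, x ^ 1) for x in range(0, n, 2)]
    return shuffle, exchange

shuffle, exchange = shuffle_exchange_edges(3)
print(len(shuffle) + len(exchange))
```

Under this convention the two nodes 0...0 and 1...1 carry shuffle self-loops, which are included in the count of $2^m$ shuffle edges.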

For convenience we define two related networks. The doubly linked
shuffle-exchange network $DSE_m$ is obtained from the shuffle-exchange network $SE_m$
by replacing each exchange edge with two edges. The 4-pin shuffle-exchange
network [2] $4SE_m$ is obtained by merging in $SE_{m+1}$ pairs of nodes differing in the
last address bit only. It consists of $2^m$ nodes, each incident to four edges.
Node $\langle a_1 \ldots a_m \rangle$ is connected to the nodes $\langle \beta, a_1 \ldots a_{m-1} \rangle$ and $\langle a_2 \ldots a_m, \beta \rangle$, for
β = 0,1. We shall assume that the edges of the 4-pin shuffle-exchange network
are directed in the unshuffle direction, i.e. connect $\langle a_1 \ldots a_m \rangle$ to
$\langle \beta, a_1 \ldots a_{m-1} \rangle$. This last network is closely related to the Ω-network: It is
obtained from an Ω-network by coalescing together nodes in successive columns
that have the same address within the column. We can therefore apply the results
obtained for Ω-networks to 4-pin shuffle-exchange networks.

THEOREM 6.1 Let α, β and γ be constants such that α > 0, β < 1 and γ < 1/2e.
Let U be a set of nodes in $4SE_m$ of size $s \le \gamma 2^m$ and perimeter p, where
$m^\alpha \le p \le 2^{\beta m}$. Then $s = O(p \lg p)$.

PROOF: Since each node has two incoming edges and two outgoing edges, there are
p/2 edges entering U and p/2 edges leaving U. Let U' be the set of nodes in the
Ω-network defined by $U' = \{ S_{a_1 \ldots a_m;j} : \langle a_1 \ldots a_m \rangle \in U \}$. The set U' contains
ms nodes and has at most 2s+(m-1)p/2 incoming edges. It follows, by Th. 5.1,
that

$$ ms \le \tfrac{1}{2}\,[2s+(m-1)p/2]\,\lg[2s+(m-1)p/2]. \tag{5.1} $$

Let

$$ F(s) = \tfrac{1}{2}\,[2s+(m-1)p/2]\,\lg[2s+(m-1)p/2] - ms. \tag{5.2} $$

F(s) is decreasing for

$$ s \le s_{\min} = 2^{m-1}/e - (m-1)p/4, \tag{5.3} $$

increasing afterward, and it has at $s_{\min}$ a minimum value of

$$ F(s_{\min}) = m(m-1)p/4 - 2^{m-1}/(e \ln 2). \tag{5.4} $$
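
The location and value of this minimum can be checked directly. The following is a short worked derivation, assuming that F carries the factor 1/2 inherited from Th. 5.1; the auxiliary symbol x is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
x &= 2s + (m-1)p/2, \qquad F(s) = \tfrac{1}{2}\, x \lg x - ms, \\
F'(s) &= \lg x + \lg e - m, \\
F'(s) = 0 &\iff x = 2^m/e \iff s = s_{\min} = 2^{m-1}/e - (m-1)p/4, \\
F(s_{\min}) &= \tfrac{1}{2}\cdot\frac{2^m}{e}\,(m - \lg e)
               - m\left(\frac{2^{m-1}}{e} - \frac{(m-1)p}{4}\right)
             = \frac{m(m-1)p}{4} - \frac{2^{m-1}}{e \ln 2}.
\end{align*}
```

Since $p \le 2^{\beta m}$ with β < 1, the negative term dominates for large m, which is why the minimum value is eventually negative.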

From the choice of β and γ it follows that for m large enough $s_{\min} > \gamma 2^m$ and
$F(s_{\min}) < 0$. It follows that F(s) has a unique root $s_0$ in the interval $[0, s_{\min}]$,
and that inequality (5.1) is satisfied by s in the range $0 \le s \le \gamma 2^m$ only if
$s \le s_0$. We shall now estimate the root $s_0$.

Let

$$ \sigma_0 = \frac{m(m-1)p}{4s_0 + p(m-1)}. \tag{5.5} $$

Substituting for $s_0$ in the equation $F(s_0) = 0$, we obtain the equation

$$ \sigma_0 - \lg \sigma_0 = m - \lg[m(m-1)p/2]. \tag{5.6} $$

Inequality 5.3 implies that

$$ \sigma_0 \ge \frac{m(m-1)p}{4 s_{\min} + p(m-1)} = \frac{e\, m(m-1)p}{2^{m+1}}. $$

Substituting back in equation 5.6 we obtain

$$ \sigma_0 = m - \lg[m(m-1)p/2] + \lg \sigma_0 \tag{5.7} $$
$$ \ge m - \lg[m(m-1)p/2] + \lg[e\, m(m-1)p/2^{m+1}] $$
$$ = \lg e. $$

Thus $\lg \sigma_0 > 0$, so that substituting again in equation 5.7 we obtain

$$ \sigma_0 \ge m - \lg[m(m-1)p/2]. $$

Substituting back for $s_0$ we obtain

$$ s_0 = \tfrac{1}{4}\, p(m-1)\left(\frac{m}{\sigma_0} - 1\right) $$
$$ \le \tfrac{1}{4}\, p(m-1)\left(\frac{m}{m - \lg m - \lg(m-1) - \lg p + 1} - 1\right) $$
$$ = \frac{p(m-1)(\lg m + \lg(m-1) + \lg p - 1)}{4(m - \lg m - \lg(m-1) - \lg p + 1)} $$
$$ \le \frac{p(m-1)(\lg m + \lg(m-1) + \lg p - 1)}{4((1-\beta)m - \lg m - \lg(m-1) + 1)} $$
$$ = O(p \lg p). $$


COROLLARY 6.2 The claim of theorem 6.1 is valid for subsets of (i) the doubly
linked shuffle-exchange network, and (ii) the shuffle-exchange network.

PROOF: (i) Let U be a set of nodes in $DSE_m$. Assume that the node $\langle a_1 \ldots a_m \rangle$ is
in U whereas the node $\langle a_1 \ldots \bar{a}_m \rangle$ is not in U. Then the set $U' = U \cup \{\langle a_1 \ldots \bar{a}_m \rangle\}$
contains one more node and no more outgoing edges than U. It follows that we can
assume w.l.g. that $\langle a_1 \ldots a_{m-1}, 0 \rangle \in U$ iff $\langle a_1 \ldots a_{m-1}, 1 \rangle \in U$. The claim now
follows immediately from Th. 6.1.

(ii) If U is a set of weight s and perimeter p in $SE_m$ then the corresponding
set of nodes in $DSE_m$ has weight s and perimeter ≤ 2p. The claim follows
therefore from (i).


COROLLARY 6.3 Let α > 0 and β < 1 be fixed constants. If $p \ge m^\alpha$ then a
$(2^{\beta m}, p)$-partition of the shuffle-exchange network $SE_m$ has $\Omega(2^m/(p \lg p))$ components.

This corollary will be slightly strengthened in Cor. 7.5.


7.  PERMUTATIONS  AND  PARTITIONING 

The Ω-network belongs to a class of networks that support many permutations
efficiently, since they are rich in edge-disjoint paths leading
from the inputs to the outputs of the network. In this section we define
this class more precisely and show that no network in this class can be
partitioned more efficiently than the Ω-network.

Let N = <V,P,E> be a network. We shall assume that the edges of N are
directed, and that P is the disjoint union of a set I of input ports with no
incoming edges and a set O of output ports with no outgoing edges. We say that a
mapping f: I → O is realizable by the network N if N contains edge-disjoint paths
connecting simultaneously each input i to the output f(i). Let $I = \{i_0, \ldots, i_{m-1}\}$
be the m inputs of the network N and $O = \{o_0, \ldots, o_{n-1}\}$ be the n outputs of the
network. A family $\{f_r\}$, r = 1,..,n, of mappings is transitive if each input i is
mapped onto the n different outputs in O by the n mappings. For example, if
m = n the cyclic shifts $\{f_r\}$ defined by $f_r(i_j) = o_{(j+r) \bmod n}$ form a transitive
family. A network is transitive if it can realize a transitive family of
mappings. This class of networks is similar to the shifting graphs of [8] (the
difference being that we consider edge-disjoint paths rather than node-disjoint
paths). The following theorem is the analogue of theorem 2.2.1 proven there.

THEOREM 7.1 Let N be a transitive network with m inputs and n outputs where
each node has outdegree (indegree) bounded by d. Then N contains at least
(m lg n)/(d lg d) internal nodes.

PROOF: Let k be the number of internal nodes. The number of edges in the
network is no larger than dk. Let d(j,k) be the length of a shortest path from
input $i_j$ to output $o_k$. Consider a directed tree T of shortest paths connecting
the input $i_j$ to each of the outputs $o_1, \ldots, o_n$. This tree has degree ≤ d+1. It
follows that the external path length of this tree is no smaller than the external
path length of a complete d-ary tree with n leaves, so that

$$ \sum_k d(j,k) \ge (n \lg n)/(\lg d). $$

Let $\{f_r\}$, r = 1,...,n, be a transitive class of mappings realizable on N,
and let $L_r$ be the total length of the paths realizing the mapping $f_r$. Then

$$ \sum_r L_r \ge \sum_{r=1}^{n} \sum_{i=1}^{m} d(i, f_r(i)) = \sum_{i,j} d(i,j) \ge mn(\lg n)/\lg d. $$

Therefore there exists some r such that $L_r \ge (m \lg n)/(\lg d)$. Since each edge
occurs at most once on the paths realizing $f_r$ we obtain that $dk \ge (m \lg n)/(\lg d)$,
or $k \ge (m \lg n)/(d \lg d)$.

For any d > 1 an Ω-network of degree d can realize all the cyclic shifts
[6], and is therefore transitive. Since an Ω-network of degree d with n inputs
has (n lg n)/(d lg d) internal nodes, it follows that the last lower bound is tight
for any d, and that Ω-networks are optimal transitive networks.
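
The tightness claim is a matter of simple arithmetic, which the following sketch spells out. The function names are hypothetical; the size formula assumes the degree-d Ω-network described in Section 5, with $d^m$ inputs and m columns of $d^{m-1}$ internal nodes.

```python
from math import isclose, log2

def transitive_lower_bound(m_inputs, n_outputs, d):
    """Lower bound (m lg n) / (d lg d) of Theorem 7.1 on the number of internal nodes."""
    return m_inputs * log2(n_outputs) / (d * log2(d))

def omega_size(d, columns):
    """Internal nodes of a degree-d network with d^columns inputs:
    `columns` columns of d^(columns-1) nodes each."""
    return columns * d ** (columns - 1)

# A degree-2 network with 8 inputs and 8 outputs: bound and actual size agree.
n = 2 ** 3
print(isclose(transitive_lower_bound(n, n, 2), omega_size(2, 3)))
```

With $n = d^m$ inputs the bound evaluates to $d^m \cdot m \lg d/(d \lg d) = m d^{m-1}$, which is exactly the number of internal nodes.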

COROLLARY 7.2 An (∞,p)-partition of a transitive network N with m inputs and n
outputs contains at least (m lg n)/(p lg p) components.

PROOF: An (s,p)-partition of N corresponds to an embedding of N into a
transitive network N' that has m inputs and n outputs, where each internal node
has degree at most p. Thus, the number of internal nodes of N', which is the
number of components of the partition, is bounded from below by (m lg n)/(p lg p).

The last result implies in particular cor. 5.2 on partitions of Ω-networks
(but not theorem 5.1).

A  similar  result  can  be  obtained  for  closed  networks.  The  relevant  property 
here  is  the  time  required  to  permute  data  items  that  are  stored  at  the  nodes  of 
the  network.  It  is  well  known  that  if  each  node  of  a  shuffle-exchange  network 
stores  one  data  item  then  any  permutation  of  the  items  can  be  executed  in  O(lgn) 
routing  steps  [10].  We  shall  show  that  no  network  with  the  same  property  can  be 
better  partitioned  than  the  shuffle-exchange. 

Let G = <V,E> be the (directed) multigraph of a closed network. If the node
u is connected to the node v by an edge, then an item can be transmitted from u to
v in one time unit. In each time unit, every node can simultaneously receive one
item on each incoming edge and send one item on each outgoing edge. All
communications are synchronous. We assume that m items are stored at the n nodes
of G, such that item j is at node v(j). The graph G realizes the permutation
σ ∈ S_m in time T if all items j can be simultaneously routed to their
destination nodes v(σ(j)) in T steps. A family {σ_i}, i = 1,...,m, of
permutations is transitive if each item j is mapped onto the m different items
1,...,m by the m permutations.
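
The definition of a transitive family can be checked mechanically. The sketch below is an illustration only: it numbers the items 0,...,m-1 rather than 1,...,m, represents a permutation as a list perm with perm[j] the image of item j, and uses hypothetical function names. The cyclic shifts serve as a simple example of a transitive family.

```python
from typing import List, Sequence


def is_transitive_family(perms: Sequence[Sequence[int]]) -> bool:
    """Return True if the m given permutations of 0..m-1 map each item
    onto m different items."""
    m = len(perms)
    # Every member must be a permutation of the items 0..m-1.
    for perm in perms:
        if sorted(perm) != list(range(m)):
            return False
    # Each item j must have m distinct images under the m permutations.
    for j in range(m):
        if len({perm[j] for perm in perms}) != m:
            return False
    return True


def cyclic_shifts(m: int) -> List[List[int]]:
    """The m cyclic shifts j -> (j + i) mod m, for i = 0..m-1."""
    return [[(j + i) % m for j in range(m)] for i in range(m)]
```

A family consisting of m copies of the identity fails the test, since every item has only one image.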

THEOREM 7.3  Let N be a closed network with n nodes, each with outdegree
(indegree) bounded by p. Assume that m items are stored at the nodes of N so that
each node contains at most s items. Let {σ_i} be a transitive family of
permutations on these items. Then there exists a permutation in that family that
cannot be realized in less than Ω(m lg(m/s)/(n p lg p)) time.


PROOF: Let {σ_i} be a transitive family of permutations on {1,...,m} that can be
performed in time T, and let d(i,j) be the distance between v(i), the node
containing item i, and v(j). The internal path length of a tree with k nodes of
degree ≤ p is Ω(k lg k/lg p). The sum Σ_j d(i,j) of the distances from node v(i) to
each node v(j) storing element j is minimized if the elements are stored in a
minimal depth tree of degree p rooted at v(i), with s elements at each node.
Thus

\[
\sum_j d(i,j) = \Omega\!\left( s \cdot \frac{(m/s)\lg(m/s)}{\lg p} \right)
             = \Omega\!\left( \frac{m \lg(m/s)}{\lg p} \right).
\]


Let c(i) be the number of individual routing steps required to perform the
permutation σ_i. We clearly have

\[
c(i) \ge \sum_j d(j, \sigma_i(j)).
\]


On the other hand, at most np routing steps can be performed in any one cycle, so
that

\[
c(i) \le T n p.
\]

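Summing up over all the permutations we obtain

\[
T m n p \;\ge\; \sum_i c(i) \;\ge\; \sum_i \sum_j d(j, \sigma_i(j))
        \;=\; \sum_i \sum_j d(i,j)
        \;=\; \Omega\!\left( \frac{m^2 \lg(m/s)}{\lg p} \right),
\]

where the first equality holds because the family is transitive: for each item j,
the images σ_i(j) range over all m items. It follows that

\[
T = \Omega\!\left( \frac{m \lg(m/s)}{n p \lg p} \right).
\]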
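
The counting argument of the proof can be traced on a small example. The following sketch is an illustration under assumptions of our own, not a construction from the paper: the network is given as an adjacency dictionary and is assumed to be strongly connected, items are numbered from 0, and all function names are hypothetical. It computes the total distance Σ_i Σ_j d(j, σ_i(j)) by breadth-first search and divides it by the number of permutations times np, which yields a lower bound on the time T needed by the slowest permutation of the family.

```python
from collections import deque
from typing import Dict, List, Sequence

Graph = Dict[int, List[int]]  # node -> successors, one entry per outgoing edge


def bfs_distances(adj: Graph, source: int) -> Dict[int, int]:
    """Directed shortest-path distances (in edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def total_routing_distance(adj: Graph, location: Sequence[int],
                           perms: Sequence[Sequence[int]]) -> int:
    """Sum over all permutations and items j of d(v(j), v(perm[j]))."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    return sum(dist[location[j]][location[perm[j]]]
               for perm in perms for j in range(len(location)))


def time_lower_bound(adj: Graph, location: Sequence[int],
                     perms: Sequence[Sequence[int]]) -> float:
    """Lower bound on T: each permutation uses at most T*n*p routing steps."""
    n = len(adj)
    p = max(len(successors) for successors in adj.values())
    return total_routing_distance(adj, location, perms) / (len(perms) * n * p)


# Example: directed ring on 8 nodes, one item per node, all cyclic shifts.
ring = {u: [(u + 1) % 8] for u in range(8)}
location = list(range(8))
shifts = [[(j + i) % 8 for j in range(8)] for i in range(8)]
```

On this ring the total distance is 224 and the resulting bound is 3.5 steps, while the slowest shift needs 7 steps because every item must traverse 7 edges. The bound is therefore valid but not tight for this particular network.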


The last result implies in particular that a network of fixed degree that
stores one item at each of its M nodes cannot realize every permutation from a
transitive family of permutations in less than Ω(lg M) steps. A similar result
appears in [3].

COROLLARY 7.4  Let N be a closed network with M nodes that realizes in time T a
transitive family of permutations on items stored one at each node of N. Then any
(s,p)-partition of N has at least (M lg(M/s))/(T p lg p) components. In particular,
if T = O(lg M) and s = O(M^β) for fixed β < 1, then the number of components is
Ω(M/(p lg p)).

PROOF: An (s,p)-partition of N into m components corresponds to a mapping of N
onto a closed network N' that has m nodes, has degree bounded by p, and can
realize a transitive family of permutations on M items, stored at most s at each
node, in time T. Thus, by the last theorem, T ≥ (M lg(M/s))/(m p lg p), and
solving for m gives the stated bound.

• 
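
The "in particular" clause of Corollary 7.4 follows by direct substitution. As a worked step, assume for concreteness that T ≤ c lg M and s ≤ M^β, where the constant c is introduced here only for illustration:

```latex
\[
\frac{M \lg(M/s)}{T\,p \lg p}
\;\ge\; \frac{M \lg\bigl(M^{1-\beta}\bigr)}{c \lg M \cdot p \lg p}
\;=\; \frac{(1-\beta)\,M \lg M}{c \lg M \cdot p \lg p}
\;=\; \frac{1-\beta}{c} \cdot \frac{M}{p \lg p}
\;=\; \Omega\!\left( \frac{M}{p \lg p} \right).
\]
```

The factor 1-β is a positive constant because β < 1 is fixed, which is why the condition on s is needed.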

COROLLARY 7.5  Let β < 1 be a fixed constant. Then any (2^{βn},p)-partition of the
shuffle-exchange network has Ω(2^n/(p lg p)) components.

PROOF: Follows from the last corollary and the fact that any permutation can be
realized on a shuffle-exchange network in logarithmic time.

8.  CONCLUSION

It is well known from studies in VLSI complexity that for many networks the
area required for a planar implementation of the network grows faster than
linearly in the number of nodes. Thus, an increasing fraction of the chip area
is occupied by connections rather than by active gates. The results of this
paper suggest another limitation that will diminish the returns from increased
miniaturization: the off-chip bandwidth bottleneck. Although practitioners have
long acknowledged this problem, it has apparently not motivated many
theoretical studies.

The results presented in this paper raise several interesting questions. The
perimeter-to-size relation seems to be a useful graph-theoretical concept,
measuring in a certain sense the "cohesiveness" of the graph. It would be
interesting to investigate its relation to other graph-theoretical functions,
such as bisection width [9], and to the existence of small separators for the
graph [7].
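
To make the perimeter-to-size relation concrete, the following sketch counts the perimeter of a node set in a two-dimensional grid. It is an illustration under assumptions of our own: the perimeter is taken to be the number of grid edges with exactly one endpoint in the set, and the function names are hypothetical.

```python
from typing import Iterator, Set, Tuple

Node = Tuple[int, int]  # (row, column) of a grid node


def grid_neighbors(node: Node, rows: int, cols: int) -> Iterator[Node]:
    """Yield the up/down/left/right neighbors that lie inside the grid."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield (rr, cc)


def perimeter(component: Set[Node], rows: int, cols: int) -> int:
    """Number of grid edges joining a node of component to a node outside it."""
    return sum(1
               for node in component
               for nb in grid_neighbors(node, rows, cols)
               if nb not in component)


def square_block(top: int, left: int, k: int) -> Set[Node]:
    """The k x k block of nodes whose upper-left corner is (top, left)."""
    return {(top + i, left + j) for i in range(k) for j in range(k)}
```

For a k x k block in the interior of the grid this gives perimeter 4k for size k^2, so the perimeter grows as the square root of the size.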

Many of our results on partitions seem similar to the results on the area
required for a planar layout of graphs. What is the precise relation between
these two complexity measures?

The "cohesiveness" of a graph seems to be an important property: a graph
which has no components of large size and small perimeter is "bottleneck-free".
We have shown that graphs which have a good worst case performance on routing
must be "bottleneck-free", and, therefore, cannot be partitioned into components
with few outgoing edges. The same relationship holds if the efficiency of an
interconnection network is measured by the average case performance on routing.
We do not know if the converse implication holds: is it always possible to
route in logarithmic time in a graph where components of size k have perimeter of
size Ω(k/lg k)? If not, can one give sufficient conditions for a graph to
efficiently support a transitive family of permutations?

All the lower bounds given in this paper are matched by corresponding upper
bounds, with one exception: the shuffle-exchange network. It is easy to find
components of the shuffle-exchange network with perimeter O(p) and size Ω(p lg p),
for any p. Thus, the lower bound given in Cor. 6.2 is optimal. These
components, however, do not form a partition of the shuffle-exchange network that
matches the lower bound of Cor. 6.3.

Figure 1
Subset of 2D grid

Figure 2

Figure 3
Recursive splitting of an Ω-network

Figure 4
Partition of an Ω-network

Figure 5